A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
Authors
Abstract
This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification.
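As a rough illustration of the procedure the abstract refers to, the following is a minimal sketch of constant-step-size SGD on the squared loss with a running average of the iterates. It is not the paper's analysis or its exact averaging scheme: the function names, the synthetic Gaussian data, the step size of 0.05, and the choice to average from the first iterate are all assumptions made for this example.

```python
import random


def averaged_sgd_least_squares(samples, dim, step_size):
    """Single pass of constant-step-size SGD on the squared loss.

    Returns the last iterate and the running average of all iterates.
    Averaging from the first step is an assumption of this sketch.
    """
    w = [0.0] * dim
    w_avg = [0.0] * dim
    for t, (x, y) in enumerate(samples, start=1):
        # Stochastic gradient of 0.5 * (<w, x> - y)^2 is residual * x.
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - step_size * residual * xi for wi, xi in zip(w, x)]
        # Incremental mean of the iterates seen so far.
        w_avg = [a + (wi - a) / t for a, wi in zip(w_avg, w)]
    return w, w_avg


def make_stream(w_star, n, noise_std, rng):
    """Hypothetical data model: standard Gaussian features, additive noise."""
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        y = sum(wi * xi for wi, xi in zip(w_star, x)) + rng.gauss(0.0, noise_std)
        yield x, y


def squared_error(w, w_ref):
    return sum((a - b) ** 2 for a, b in zip(w, w_ref))


rng = random.Random(0)
w_star = [1.0, -2.0, 0.5]  # illustrative ground-truth parameters
samples = list(make_stream(w_star, 20000, 0.5, rng))
w_last, w_avg = averaged_sgd_least_squares(samples, len(w_star), 0.05)

print("last iterate error:    ", squared_error(w_last, w_star))
print("averaged iterate error:", squared_error(w_avg, w_star))
```

With a constant step size the last iterate keeps fluctuating around the solution, which is the stationary behaviour the abstract describes characterizing through a covariance matrix; averaging the iterates is what damps that fluctuation.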
Similar resources
Accelerating Stochastic Gradient Descent
There is widespread sentiment that fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy ball) are not effective for the purposes of stochastic optimization due to their instability and error accumulation. Numerous works have attempted to quantify these instabilities in the face of either statistical or non-statistical errors (Paige, 1971; Proakis, 1974; Polyak, 1987; G...
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
In this paper, we propose to adopt the diffusion approximation tools to study the dynamics of Oja’s iteration which is an online stochastic gradient descent method for the principal component analysis. Oja’s iteration maintains a running estimate of the true principal component from streaming data and enjoys less temporal and spatial complexities. We show that the Oja’s iteration for the top ei...
Least-Squares Halftoning via Human Vision System and Markov Gradient Descent (LS-MGD): Algorithm and Analysis
Halftoning is the core algorithm governing most digital printing or imaging devices, by which images of continuous tones are converted to ensembles of discrete or quantum dots. It is through the human vision system (HVS) that such fields of quantum dots can be perceived almost identical to the original continuous images. In the current work, we propose a least-square based halftoning model with...
Sparse model identification using orthogonal forward regression with basis pursuit and D-optimality - Control Theory and Applications, IEE Proceedings-
An efficient model identification algorithm for a large class of linear-in-the-parameters models is introduced that simultaneously optimises the model approximation ability, sparsity and robustness. The derived model parameters in each forward regression step are initially estimated via the orthogonal least squares (OLS), followed by being tuned with a new gradient-descent learning algorithm ba...
Mapping Activity Diagram to Petri Net: Application of Markov Theory for Analyzing Non-Functional Parameters
The quality of an architectural design of a software system has a great influence on achieving non-functional requirements of a system. A regular software development project is often influenced by non-functional factors such as the customers' expectations about the performance and reliability of the software as well as the reduction of underlying risks. The evaluation of non-functional paramet...
Journal title:
Volume / Issue
Pages -
Publication date: 2017